
Unsupervised ensemble minority clustering

Author: González Pellicer Edgar
Turmo Borras Jorge
Publication date: 01/01/2012

Cluster analysis lies at the core of most unsupervised learning tasks. However, most clustering algorithms rely on the all-in assumption, under which every object belongs to some cluster, and they perform poorly on minority clustering tasks, in which a small fraction of signal data stands against a majority of noise. The approaches proposed so far for minority clustering are supervised: they require the number and distribution of the foreground and background clusters to be specified. In supervised learning and all-in clustering, combination methods have been successfully applied to obtain distribution-free learners, even from the output of weak individual algorithms. In this report, we present Ewocs, a novel ensemble minority clustering algorithm suitable for weak clustering combination, and provide a theoretical proof of its properties under a loose set of constraints. The validity of the assumptions used in the proof is empirically assessed on a collection of synthetic datasets. Preprint.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Non-parametric document clustering by ensemble methods

Author: González Pellicer Edgar
Turmo Borras Jorge
Publication date: 01/01/2008

The biases of individual algorithms for non-parametric document clustering can lead to suboptimal solutions. Ensemble clustering methods may overcome this limitation, but they have not yet been applied to document collections. This paper presents a comparison of strategies for non-parametric ensemble clustering of documents. Peer Reviewed. Postprint (published version).
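As a rough illustration of one common consensus strategy (not necessarily one of the strategies compared in the paper), the Python sketch below builds a co-association matrix from several individual clusterings, links objects that are grouped together in more than a chosen fraction of them, and reads off the connected components as the consensus clusters. The number of clusters is not fixed in advance. The `threshold` parameter and the function names are illustrative assumptions.

```python
from itertools import combinations


def co_association(clusterings):
    """Fraction of clusterings in which each pair of objects shares a cluster.

    clusterings: list of label lists, one label per object in each list.
    """
    n = len(clusterings[0])
    m = len(clusterings)
    # co[i][j] counts the clusterings that put objects i and j together.
    co = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                co[i][j] += 1
                co[j][i] += 1
    return [[v / m for v in row] for row in co]


def consensus(clusterings, threshold=0.5):
    """Link pairs whose co-association exceeds `threshold` and return the
    connected components of the resulting graph as consensus labels."""
    co = co_association(clusterings)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over the thresholded similarity graph.
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

For example, with three clusterings of five objects, `[[0, 0, 1, 1, 2], [0, 0, 0, 1, 1], [1, 1, 2, 2, 0]]`, `consensus` returns `[0, 0, 1, 1, 2]`: only the pairs (0, 1) and (2, 3) are grouped together in a majority of the runs.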

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Unsupervised document clustering by weighted combination

Author: González Pellicer Edgar
Turmo Borras Jorge
Publication date: 01/01/2006

This report proposes a novel unsupervised document clustering approach based on the weighted combination of individual clusterings. Two non-weighted combination methods, one graph-based and one probability-based, are adapted to work in a weighted fashion. The performance of the weighted approach is evaluated on real-world collections and compared with that of individual clustering and non-weighted combination. The results confirm that graph-based weighted combination consistently outperforms the other approaches. Postprint (published version).
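One way to picture a weighted combination is a weighted co-association matrix, in which each individual clustering contributes to the pairwise similarity of two objects in proportion to its weight. The Python sketch below assumes the weights are already given (for example, as some quality estimate per clustering). The abstract does not say how the report derives its weights, so `weights` and `weighted_co_association` are hypothetical. The resulting matrix could serve as the edge weights of a similarity graph for a graph-based combination method.

```python
def weighted_co_association(clusterings, weights):
    """Return an n x n matrix whose (i, j) entry is the weighted fraction of
    clusterings that put objects i and j in the same cluster.

    clusterings: list of label lists; weights: one non-negative number per
    clustering (hypothetical quality scores, assumed to be given).
    """
    if len(clusterings) != len(weights):
        raise ValueError("exactly one weight per clustering is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(clusterings[0])
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(clusterings, weights):
        for i in range(n):
            for j in range(n):
                # Each clustering adds its weight to every pair it co-clusters
                # (so the diagonal ends up at 1.0 after normalization).
                if labels[i] == labels[j]:
                    co[i][j] += w
    return [[v / total for v in row] for row in co]
```

With the clusterings `[[0, 0, 1, 1], [0, 1, 0, 1]]`, equal weights give a similarity of 0.5 for both pairs (0, 1) and (0, 2). Weights `[3, 1]` raise the first to 0.75 and lower the second to 0.25, so the clustering assumed to be more reliable dominates the combined graph.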

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

ParTes. Test suite for parsing evaluation

Author: Castellón Masalles Irene
González Pellicer Edgar
Lloberes Salvatella Marina
Padró Lluís
Publication date: 01/01/2014

This paper presents ParTes, the first test suite in Spanish and Catalan for the qualitative evaluation of parsers. This resource is a hierarchical test suite covering representative syntactic structures and argument-order phenomena. ParTes simplifies qualitative evaluation by contributing to the automation of this task. © 2014 Sociedad Española para el Procesamiento del Lenguaje Natural. Postprint (published version).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

TALP-UPC at TREC 2005: Experiments using voting scheme among three heterogeneous QA systems

Author: Ageno Pulido Alicia
Ferrés Domènech Daniel
Fuentes Fort Maria
González Pellicer Edgar
Kanaan Izquierdo Samir
Rodríguez Hontoria Horacio
Surdeanu Mihai
Turmo Borras Jorge
Publication date: 01/01/2005

This paper describes the experiments of the TALP-UPC group for factoid and ‘other’ (definitional) questions in the TREC 2005 Main Question Answering (QA) task. Our current approach for factoid questions is based on a voting scheme among three QA systems: TALP-QA (our previous QA system), Sibyl (a new QA system developed at DAMA-UPC and TALP-UPC), and Aranea (a web-based, data-driven approach). For definitional questions, we used two different systems: the TALP-QA Definitional system and LCSUM (a summarization-based system). Our results for factoid questions indicate that the voting strategy improves accuracy from 7.5% to 17.1%. Although these numbers are low (due to technical problems in the Answer Extraction phase of the TALP-QA system), they indicate that voting is a successful approach for boosting the performance of QA systems. Answers to definitional questions are produced by selecting phrases using a set of patterns associated with definitions. The best configuration of the TALP-QA Definitional system achieves an F-score of 17.2%. Postprint (published version).
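The abstract does not detail how votes are cast. The Python sketch below therefore shows only the simplest form of answer voting, as an assumption. Each system contributes its top answer. Answers are normalized so that trivially different strings match. The answer proposed by the most systems wins, and ties are broken by a hypothetical system priority order. The system names come from the abstract; `normalize`, `vote`, and the priority order are illustrative.

```python
from collections import Counter


def normalize(answer):
    """Lowercase, trim surrounding punctuation, and collapse whitespace."""
    return " ".join(answer.lower().strip(" .,;:").split())


def vote(answers_by_system, priority):
    """Return the (normalized) answer proposed by the most systems.

    answers_by_system: dict mapping system name -> top answer string (or None)
    priority: system names, most trusted first, used only to break ties
    """
    counts = Counter(normalize(a) for a in answers_by_system.values() if a)
    if not counts:
        return None
    best = max(counts.values())
    tied = {a for a, c in counts.items() if c == best}
    # Prefer the answer of the most trusted system among the tied candidates.
    for system in priority:
        answer = answers_by_system.get(system)
        if answer and normalize(answer) in tied:
            return normalize(answer)
    # Fall back to a deterministic choice if no prioritized system is tied.
    return min(tied)
```

With the answers `Paris` (TALP-QA), `paris.` (Sibyl), and `Lyon` (Aranea), the two matching answers outvote the third and `vote` returns `"paris"`.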

UPCommons. Portal del coneixement obert de la UPC

The TALP participation at TAC-KBP 2012

Author: Ageno Pulido Alicia
Comas Umbert Pere Ramon
González Pellicer Edgar
Martí Maria Antònia
Mehdizadeh Naderi Ali
Rodríguez Hontoria Horacio
Sapena Masip Emilio
Turmo Borras Jorge
Vila Rigat Marta
Publication date: 01/01/2012

This document describes the work performed by the Universitat Politècnica de Catalunya (UPC) in its first participation in TAC-KBP 2012, in both the Entity Linking and the Slot Filling tasks. Peer Reviewed. Postprint (author’s final draft).

UPCommons. Portal del coneixement obert de la UPC